- 
                Notifications
    You must be signed in to change notification settings 
- Fork 697
Improve header chain extension performance #1891
Improve header chain extension performance #1891
Conversation
        
          
                eth/db/chain.py
              
                Outdated
          
        
      | if current_canonical_head and header.parent_hash == current_canonical_head: | ||
| cls._add_block_number_to_hash_lookup(db, header) | ||
| db.set(SchemaV1.make_canonical_head_hash_lookup_key(), header.hash) | ||
| return (header,), tuple() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, doing this optimization makes sense, but the logic duplication (adding block number lookup, and setting the canonical head hash) concerns me. If something updates further down in the method, it would be really easy to forget to do it here.
One alternative:
- In the if block, only set new_canonical_headers, old_canonical_headers = (header,), ()
- Move the following 3 lines of {old,new}_canonical_headersgeneration into the else block
It does one extra db lookup at the _get_canonical_block_hash to check for an old header.
If you think that extra lookup is critical, and worth some more PR thrash, then you could move the for h in new_canonical_headers block into a cls._decanonicalize_old_headers(db, new_headers) call, inside the else block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, great point! I don't think the extra lookup matters, I'll profile it to make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly enough the extra lookup added ~9% to import time. I pushed a change which keeps it out but also removes some duplicated code.
        
          
                eth/db/header.py
              
                Outdated
          
        
      | def _get_canonical_head_hash(cls, db: DatabaseAPI) -> Hash32: | ||
| try: | ||
| canonical_head_hash = db[SchemaV1.make_canonical_head_hash_lookup_key()] | ||
| return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()]) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Christoph was recommending doing the following instead. I guess type-checking is supposed to notice if you send a non-bytes value this way (but not with cast):
| return cast(Hash32, db[SchemaV1.make_canonical_head_hash_lookup_key()]) | |
| return Hash32(db[SchemaV1.make_canonical_head_hash_lookup_key()]) | 
YMMV, do what you like :)
        
          
                eth/db/header.py
              
                Outdated
          
        
      | @classmethod | ||
| def _get_canonical_head(cls, db: DatabaseAPI) -> BlockHeaderAPI: | ||
| canonical_head_hash = cls._get_canonical_head_hash(db) | ||
| return cls._get_block_header_by_hash(db, Hash32(canonical_head_hash)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Hash32 wrapper looks redundant.
| previous_score = score - header.difficulty | ||
| cls._set_hash_scores_to_db(db, header, previous_score) | ||
| cls._set_as_canonical_chain_head(db, header.hash, header.parent_hash) | ||
| cls._set_as_canonical_chain_head(db, header, header.parent_hash) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay that this change has so straightforward, and didn't require a new header lookup anywhere!
| Just addressed the rest of the outstanding PR feedback, ready for another review :) | 
| The lint failure is... interesting. My first guess is that mypy just upgraded, and circleci doesn't pin the version? EDIT: nope, mypy is on the right version, 0.701 | 
| @lithp interesting indeed! I looked into it. When installed with  But when we install with  So, my guess is something in  | 
| 
 Ahah! I was wondering why I could only reproduce linting issues with local tox runs, on certain issues. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, yeah the refactor looks good. I think it leans a but further into subclassing vs component architecture, but fixing that would be a giant change unrelated to this one.
👍 when linting passes
| try: | ||
| for transaction_hash in cls._get_block_transaction_hashes(db, old_header): | ||
| transaction_hashes = cls._get_block_transaction_hashes(db, old_header) | ||
| for transaction_hash in transaction_hashes: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This split was just to do fewer things on one line, right? (I'm fine with it, just wondering if there was something subtle I was missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, no subtle reason, the change was to get us under the 90-character line width limit
| Woah, thanks for looking into it @cburgdorf, that helped a lot! | 
Also make an unrelated change which makes circleci happy, it uses an updated version of eth-utils.
6392266    to
    92e575b      
    Compare
  
    
What was wrong? / How was it fixed?
Breaking this out from #1888, improves performance if the parent of the header being imported is the canonical chain tip.
_find_new_ancestorsis relatively expensive; if I remember right a lot of time is spent fetching and de-serializing the header which was just imported, when all that's wanted is its hash.To-Do
Cute Animal Picture